Abstract

We construct a gene network based on expression data from DNA microarray
experiments, by establishing a link between two genes whenever the Pearson
correlation coefficient between their expression profiles is higher than a
certain cutoff. The resulting connectivity distribution is compatible with a
power-law decay with exponent γ ~ 1, corrected by an exponential cutoff at
large connectivity. The biological relevance of such a network is demonstrated
by showing that there is a strong statistical correlation between high
connectivity and lethality: in close analogy to what happens for protein
interaction networks, essential genes are strongly overrepresented among the
hubs of the network, that is, the genes connected to many other genes.

DNA microarray experiments are one of the most powerful tools for studying in- 
teractions between genes on the scale of the whole genome. It is widely believed 
that a huge amount of biologically relevant information is encoded in the results 
of such experiments, and that new analytical methods need to be developed to 
extract it. 

In this work we propose to analyse the expression data obtained in microar- 
ray experiments by constructing a network of coregulated genes: the genes are 
the nodes of the network, and a link is established between two nodes whenever 
they are similarly expressed across many experimental conditions. On one hand, 
we show that such a network, like many other known networks of self-organizing
origin, shows a connectivity distribution that decays with a power law corrected, 
for large values of the connectivity, by an exponential cutoff [[j], ^ . On the other 
hand, we show that the network encodes biologically relevant information in 
its topology: exactly as in the case of the protein interaction network in yeast,
centrality is strongly correlated to lethality. In other words, among the
genes that have the highest connectivity in the network, essential genes, whose 
deletion produces an inviable mutant, are strongly overrepresented. 

We work on yeast (S. cerevisiae), and use the expression data made publicly
available by the authors of Ref. 0] (which also include the data obtained
by the authors of Ref. Q), who performed a series of microarray experiments
with the goal of identifying cell-cycle regulated genes. The data consist of the
expression profiles for virtually all of the ~ 6200 yeast genes across a total of
77 timepoints.

The network is constructed by the following procedure: 

1. To each gene we associate its expression profile defined as a string of 77 
real numbers, representing, as is customary, the log2 of the ratio between
expression (quantity of mRNA) at the given time-point and a reference 
value of the expression. The data come from different experiments and 
have been processed by the authors of Ref. Q] , to which we refer for details, 
so as to be comparable to each other across the various experiments. 

2. Missing values are replaced with the average expression over the available 
timepoints. To prevent this manipulation from having a sizable effect on 
the construction of the network, we retain only those genes for which at 
least 70 timepoints out of 77 are available. We are thus left with 5293 
genes as the nodes of our network. 

3. We compute the Pearson correlation coefficient r for all pairs of nodes
in the network. 

4. We establish a link between two nodes whenever r is larger than a certain
cutoff C.
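
The four steps above can be sketched in Python with NumPy. This is only a
minimal illustration under stated assumptions, not the code used for the
analysis: the function name build_network, the encoding of missing values as
NaN, and the array layout (one row per gene, one column per timepoint) are
choices made here for the example.

```python
import numpy as np

def build_network(expr, min_present, cutoff):
    """Sketch of steps 1-4 (hypothetical helper, not from the paper).

    expr        : (genes x timepoints) array of log2 ratios, NaN = missing
    min_present : minimum number of available timepoints to keep a gene
    cutoff      : correlation cutoff C
    Returns the indices of the retained genes and a boolean adjacency matrix.
    """
    expr = np.asarray(expr, dtype=float)

    # Step 2a: keep only genes with enough available timepoints.
    keep = (~np.isnan(expr)).sum(axis=1) >= min_present
    data = expr[keep].copy()

    # Step 2b: replace each missing value with that gene's own average
    # over the available timepoints.
    row_mean = np.nanmean(data, axis=1)
    missing = np.isnan(data)
    data[missing] = np.take(row_mean, np.nonzero(missing)[0])

    # Step 3: Pearson correlation coefficient r for all pairs of genes.
    r = np.corrcoef(data)

    # Step 4: link two genes whenever r exceeds the cutoff (no self-links).
    adj = r > cutoff
    np.fill_diagonal(adj, False)
    return np.nonzero(keep)[0], adj
```

With the values used in the text, the call would be
build_network(expr, min_present=70, cutoff=C); the connectivity of each node
is then adj.sum(axis=1).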



The only free parameter in the procedure is the cutoff C. A possible way of 
choosing it is to compare the probability distribution of r for the actual data 
to the same distribution after the data have been randomized by shuffling the 
expression values of each gene. For the randomized data, no pair of genes shows 
a value of r greater than 0.67: therefore by choosing C = 0.67 the links we 
create can be considered of biological origin. A similar procedure was used in 
Ref.^J, where a network was constructed by establishing a link between two 
genes whenever the effects of their deletion on the expression of the rest of the 
genome were linearly correlated. 
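
The randomization test described above can be sketched as follows. This is an
illustrative sketch only: the function name null_max_correlation, the number
of shuffles, and the random seed are assumptions, and the text does not state
how many randomizations were performed.

```python
import numpy as np

def null_max_correlation(data, n_shuffles=1, seed=0):
    """Largest pairwise Pearson r after shuffling each gene's profile.

    data is a (genes x timepoints) array with no missing values. Each row
    is permuted independently, which destroys any co-expression between
    genes while preserving every gene's own distribution of values.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    largest = -np.inf
    for _ in range(n_shuffles):
        shuffled = np.array([rng.permutation(row) for row in data])
        r = np.corrcoef(shuffled)
        np.fill_diagonal(r, -np.inf)   # ignore the trivial r = 1 on the diagonal
        largest = max(largest, r.max())
    return largest
```

A cutoff C chosen at or above the returned value admits no links in the
randomized data, which is the criterion used in the text to select C = 0.67.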

With this choice of the cutoff C, 17643 links are established between the
genes. Defining the connectivity k of a node as the number of links departing
from it, we have an average connectivity ⟨k⟩ ~ 6.67. Defining N(k) as the
number of nodes with connectivity k, we see that N(k) shows a long tail that
reaches up to k = 173: the shape of the distribution is compatible with a
power-law decay with exponential cutoff:

    N(k) ∝ k^(-γ) exp(-k/k_c)

with exponent γ and cutoff connectivity k_c.

Figure 1: Linear-log plot of N(k), the number of nodes with connectivity k,
showing that the decay is slower than exponential.



This is shown in Figs. 1-3: Fig. 1 shows N(k) in logarithmic scale as a
function of k (noise in the data has been reduced by logarithmic binning), showing
that the decay of N(k) is slower than exponential for small to moderate values
of k. Fig. 2 shows the same data with logarithmic scale on the k axis too,
and demonstrates that the decay is faster, at large k, than the pure power law
characteristic of scale-free networks. Finally, Fig. 3 shows the data after
correction with the exponential cutoff, with k_c ~ 38. The slope of the straight
line is γ = 0.95. An analysis of the cumulative distribution confirms these results.
Interestingly, a recent study of the transcriptional regulation network in yeast
also revealed a scale-free network with γ ~ 1 ||.
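
The logarithmic binning and the estimate of the exponent and cutoff can be
illustrated with the sketch below. The text does not say how the fit was
performed; the linear least-squares fit of log N(k) used here is an
assumption, as are the function names and the default number of bins.

```python
import numpy as np

def log_binned_density(degrees, n_bins=15):
    """Logarithmically binned estimate of N(k), in counts per unit k."""
    degrees = np.asarray(degrees, dtype=float)
    degrees = degrees[degrees > 0]               # k = 0 has no place on a log axis
    edges = np.logspace(0.0, np.log10(degrees.max() + 1.0), n_bins + 1)
    counts, _ = np.histogram(degrees, bins=edges)
    centers = np.sqrt(edges[:-1] * edges[1:])    # geometric bin centres
    density = counts / np.diff(edges)            # normalise by bin width
    keep = counts > 0                            # drop empty bins
    return centers[keep], density[keep]

def fit_power_law_with_cutoff(k, n_k):
    """Fit log N(k) = a - gamma * log(k) - k / k_c by linear least squares.

    The model is linear in (a, gamma, 1/k_c), so no iterative optimiser
    is needed. Returns (gamma, k_c).
    """
    k = np.asarray(k, dtype=float)
    n_k = np.asarray(n_k, dtype=float)
    design = np.column_stack([np.ones_like(k), -np.log(k), -k])
    coef, *_ = np.linalg.lstsq(design, np.log(n_k), rcond=None)
    _, gamma, inv_kc = coef
    return gamma, 1.0 / inv_kc
```

Multiplying the binned N(k) by exp(k/k_c) and plotting the result against k
on log-log axes gives a straight line of slope -γ when the model holds.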

In this paper, we are mainly concerned with establishing the biological rele- 
vance of this network, independently of any of its graph-theoretic features. This 
we will do by showing that the nodes in the network with high connectivity are 
more likely to be essential genes, whose deletion produces an inviable mutant. 

A list of essential yeast genes is publicly available from the Saccharomyces 
Genome Deletion Project [gj, and comprises 1104 genes, corresponding to 18.7% 
of the genes deleted in the project. Of the 5293 genes in our network, 964 
(18.2%) are included in the list. Fig. 4 shows the fraction f(k_min) of essential
genes among the genes having connectivity k_min or more, as a function of k_min:
it grows from 0.182 at k_min = 0 (by definition) to 1 for the 6 most connected
genes (k > 155).
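
The quantity f(k_min) can be computed as in the following sketch, in which
the function name and the argument layout (one connectivity value and one
essentiality flag per gene) are assumptions made for illustration.

```python
import numpy as np

def essential_fraction(degrees, is_essential, k_min):
    """Fraction of essential genes among genes with connectivity >= k_min.

    Returns (fraction, n, m), where n is the number of genes selected and
    m the number of essential genes among them.
    """
    degrees = np.asarray(degrees)
    is_essential = np.asarray(is_essential, dtype=bool)
    selected = degrees >= k_min
    n = int(selected.sum())
    m = int((selected & is_essential).sum())
    fraction = m / n if n > 0 else float("nan")
    return fraction, n, m
```

At k_min = 0 every gene is selected, so the fraction reduces to the overall
proportion of essential genes in the network, 964/5293, or about 0.182.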

Figure 2: Log-log plot of N(k), showing a power-law-like decay with an
exponential cutoff at large k.

Figure 3: Log-log plot of N(k) after correction with the exponential cutoff at
k_c ~ 37. The slope of the straight line is ~ 0.95.

Figure 4: The fraction f(k_min) of essential genes among the ones with
connectivity k ≥ k_min.

Fig. 4 shows that essential genes are more and more overrepresented
as the minimum connectivity k_min is increased. To evaluate the statistical
significance of such overrepresentation, suppose that the number of genes with
connectivity k ≥ k_min is n, and that among these m are essential. Then one can
evaluate the probability P(N, M; n, m) that, out of n nodes chosen at random
from a set of N, m or more are essential genes, when the total number of
essential genes is M. This probability can be computed as the right tail of
the appropriate hypergeometric distribution, and reaches very small values: for
example, the fraction of essential genes reaches 50% for k_min = 37, with m = 127
essential genes out of n = 251 nodes, and the probability P(N = 5293, M =
964; n = 251, m = 127) is about 4 · 10^-33.
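
The right tail of the hypergeometric distribution can be evaluated exactly
with integer arithmetic, as in the sketch below. The function name is an
assumption; only the Python standard library is used (math.comb requires
Python 3.8 or later).

```python
from fractions import Fraction
from math import comb

def hypergeom_right_tail(N, M, n, m):
    """P(N, M; n, m): probability of drawing m or more essential genes.

    N : total number of genes       M : number of essential genes
    n : number of genes drawn       m : minimum number of essential genes
    The sum runs over every possible count i >= m of essential genes
    among the n drawn without replacement.
    """
    upper = min(n, M)
    favourable = sum(comb(M, i) * comb(N - M, n - i) for i in range(m, upper + 1))
    return Fraction(favourable, comb(N, n))

# Numbers quoted in the text for k_min = 37.
p = float(hypergeom_right_tail(5293, 964, 251, 127))
print(f"P = {p:.1e}")
```

Working with exact fractions avoids the underflow and cancellation problems
that can affect floating-point sums when the tail probability is this small.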

In conclusion, we have built a gene network based on expression data obtained
with DNA microarray experiments, by joining genes showing similar expression
profiles. The resulting network shows a power-law decay of the connectivity
distribution with an exponential cutoff, and exponent γ ~ 1. Its biological
relevance is demonstrated by the strong statistical correlation between
centrality and lethality.